Skip to content

[API] Add DeepOCR pipeline API provider#1473

Open
leejooan wants to merge 2 commits intoopen-compass:mainfrom
leejooan:feat/DEEPOCR-api-provider
Open

[API] Add DeepOCR pipeline API provider#1473
leejooan wants to merge 2 commits intoopen-compass:mainfrom
leejooan:feat/DEEPOCR-api-provider

Conversation

@leejooan
Copy link
Copy Markdown

@leejooan leejooan commented Mar 4, 2026

Summary

Add support for DeepOCR pipeline so VLMEvalKit can run evaluations
using DeepOCR's document processing pipeline via an OpenAI-compatible
chat completions API.

The DeepOCR pipeline combines deep document OCR with a large
vision-language model, making it especially strong on document
understanding and text recognition tasks.

OCRBench result: 91.7 / 100

Changes

  • vlmeval/api/deepocr_api.py: New DeepOCRAPI class.
    Uses DEEPOCR_API_BASE with Bearer DEEPOCR_API_KEY.
  • vlmeval/api/__init__.py: Export DeepOCRAPI.
  • vlmeval/config.py: New model entry:
    • DEEPOCR

Adds `DeepOCRAPI`, an OpenAI-compatible wrapper for the DeepOCR
pipeline. Credentials are configured via environment variables
`DEEPOCR_API_BASE` and `DEEPOCR_API_KEY`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@leejooan
Copy link
Copy Markdown
Author

leejooan commented Mar 4, 2026

Hi,

I’ve emailed the OpenCompass team (opencompass@pjlab.org.cn) with the environment
variables required to run this provider.

If you need any additional setup or have questions, please feel free to reach out
at lja@koreadeep.com.

We look forward to your review.


def __init__(
self,
model: str = "gpt-4-1106-vision-preview",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model name is 'gpt-4-1106-vision-preview'? better to use your own name, becasue it's related to the result file name.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The model name is 'gpt-4-1106-vision-preview'? better to use your own name, becasue it's related to the result file name.

Thanks for the feedback! I've updated the default model name from
"gpt-4-1106-vision-preview" to "deepocr" in the latest commit.

The previous name was only a placeholder to indicate OpenAI-compatible
format support. Using "deepocr" is more appropriate as the actual
model identifier and will align with the generated result/output names.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants